Smart Paper Notes

A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources

Xiaoqing Tan , Chung-Chou H. Chang , Ling Zhou , Lu Tang

Categories: (Statistics) Machine Learning | Machine Learning

2021-03-10

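Accurately estimating personalized treatment effects within a study site (e.g., a hospital) has been challenging due to limited sample size. Furthermore, privacy considerations and lack of resources prevent a site from leveraging subject-level data from other sites. We propose a tree-based model averaging approach to improve the estimation accuracy of conditional average treatment effects (CATE) at a target site by leveraging models derived from other potentially heterogeneous sites, without sharing subject-level data. To the best of our knowledge, there is no established model averaging approach for distributed data that focuses on improving the estimation of treatment effects. Specifically, under distributed data networks, our framework provides an interpretable tree-based ensemble of CATE estimators that joins models across study sites while actively modeling the heterogeneity in data sources through site partitioning. The performance of this approach is demonstrated by a real-world study of the causal effects of oxygen therapy on hospital survival rate and is supported by comprehensive simulation results.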

translated by Google Translate

Unifying Human Motion Synthesis and Style Transfer with Denoising Diffusion Probabilistic Models

Ziyi Chang , Edmund J. C. Findlay , Haozheng Zhang , Hubert P. H. Shum

Categories: Computer Vision | Artificial Intelligence

2022-12-16

Generating realistic motions for digital humans is a core but challenging part of computer animations and games, as human motions are both diverse in content and rich in styles. While the latest deep learning approaches have made significant advancements in this domain, they mostly consider motion synthesis and style manipulation as two separate problems. This is mainly due to the challenge of learning both motion contents that account for the inter-class behaviour and styles that account for the intra-class behaviour effectively in a common representation. To tackle this challenge, we propose a denoising diffusion probabilistic model solution for styled motion synthesis. As diffusion models have a high capacity brought by the injection of stochasticity, we can represent both inter-class motion content and intra-class style behaviour in the same latent. This results in an integrated, end-to-end trained pipeline that facilitates the generation of optimal motion and exploration of content-style coupled latent space. To achieve high-quality results, we design a multi-task architecture of diffusion model that strategically generates aspects of human motions for local guidance. We also design adversarial and physical regulations for global guidance. We demonstrate superior performance with quantitative and qualitative results and validate the effectiveness of our multi-task architecture.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Categories: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Weak-signal extraction enabled by deep-neural-network denoising of diffraction data

Jens Oppliger , Michael M. Denner , Julia Küspert , Ruggero Frison , Qisi Wang , Alexander Morawietz , Oleh Ivashko , Ann-Christin Dippel , Martin von Zimmermann , Niels B. Christensen

Categories: Machine Learning

2022-09-19

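Removal or cancellation of noise has widespread applications in imaging and acoustics. In everyday applications, denoising may even include generative aspects that are unfaithful to the ground truth. For scientific applications, however, denoising must reproduce the ground truth accurately. Here, we show how data can be denoised via a deep convolutional neural network such that weak signals appear with quantitative accuracy. In particular, we study X-ray diffraction on crystalline materials. We demonstrate that weak signals stemming from charge ordering, insignificant in the noisy data, become visible and accurate in the denoised data. This success is enabled by supervised training of a deep neural network with pairs of measured low- and high-noise data. This way, the neural network learns the statistical properties of the noise. We demonstrate that using artificial noise (such as Poisson and Gaussian) does not yield such quantitatively accurate results. Our approach thus illustrates a practical strategy for noise filtering that can be applied to challenging acquisition problems.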

FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search

Patrick H. Chen , Wei-cheng Chang , Hsiang-fu Yu , Inderjit S. Dhillon , Cho-jui Hsieh

Categories: Machine Learning

2022-06-22

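Approximate K-Nearest Neighbor Search (AKNNS) has become ubiquitous in modern applications, for example as a fast search procedure with two-tower deep learning models. Graph-based methods for AKNNS in particular have received great attention due to their superior performance. These methods rely on greedy graph search to traverse the vectors in a database. Under this greedy search scheme, we make a key observation: many distance computations do not influence search updates, so these computations can be approximated without hurting performance. As a result, we propose FINGER, a fast inference method to achieve efficient graph search. FINGER approximates the distance function by estimating angles between neighboring residual vectors with low-rank bases and distribution matching. The approximated distance can be used to bypass unnecessary computations, which leads to faster searches. Empirically, accelerating a popular graph-based method named HNSW with FINGER is shown to outperform existing graph-based methods by 20%-60% across different benchmark datasets.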
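
The greedy graph search that such methods rely on can be sketched as follows. This is a minimal illustration of exact best-first search over a neighbor graph, not FINGER or HNSW themselves; the function name `greedy_search`, the adjacency dictionary, the `ef` parameter name, and the toy data are all assumptions made for the example. The marked line is where each visited neighbor costs a full distance computation, which is the cost the abstract proposes to approximate.

```python
import heapq
import numpy as np

def greedy_search(vectors, graph, query, entry, ef=2):
    """Best-first search over a neighbor graph.

    Returns up to `ef` (distance, node id) pairs, nearest first.
    """
    def dist(i):
        return float(np.linalg.norm(vectors[i] - query))

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap of nodes still to expand
    best = [(-dist(entry), entry)]       # max-heap (negated) of best results so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # nearest open candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)  # full distance computation for every new neighbor
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-d, i) for d, i in best)

# Toy database: five points on a line, connected as a chain.
vectors = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
result = greedy_search(vectors, graph, query=np.array([3.2]), entry=0, ef=2)
print([node for _, node in result])
```

Distances computed for neighbors that fail the `d_nb < -best[0][0]` check never change the result set, which is the kind of computation an approximate distance could stand in for.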

QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation -- Analysis of Ranking Metrics and Benchmarking Results

Raghav Mehta , Angelos Filos , Ujjwal Baid , Chiharu Sako , Richard McKinley , Michael Rebsamen , Katrin Dätwyler , Raphael Meier , Piotr Radojewski , Gowtham Krishnan Murugesan

Categories: Computer Vision | Machine Learning

2021-12-19

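Deep learning (DL) models have provided state-of-the-art performance in a wide variety of medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder the translation of DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way towards clinical translation. Recently, a number of uncertainty estimation methods have been introduced for DL medical image segmentation tasks. Developing metrics to evaluate and compare the performance of uncertainty measures will assist the end user in making more informed decisions. In this study, we explore and evaluate a metric developed during the BraTS 2019-2020 task on uncertainty quantification (QU-BraTS), designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This metric (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence levels to incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, and hence highlight the need for uncertainty quantification in medical image analyses. Our evaluation code is publicly available at https://github.com/ragmeh11/qu-brats.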
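
The two properties of the metric described above can be illustrated with a simplified sketch. This is not the official QU-BraTS score: the function `filtered_scores`, the threshold `tau`, and the toy arrays are assumptions for illustration only. The sketch removes voxels whose uncertainty exceeds a threshold, recomputes Dice on the remaining voxels (which improves when errors are marked uncertain), and reports the fraction of correct positive voxels that were filtered out (which grows when correct predictions are under-confident).

```python
import numpy as np

def filtered_scores(pred, truth, uncertainty, tau):
    """Return (Dice on voxels with uncertainty <= tau, fraction of true positives filtered out)."""
    keep = uncertainty <= tau
    tp = np.sum(pred & truth & keep)
    fp = np.sum(pred & ~truth & keep)
    fn = np.sum(~pred & truth & keep)
    denom = 2 * tp + fp + fn
    dice = 2 * tp / denom if denom else 1.0
    tp_all = np.sum(pred & truth)
    filtered_tp = (tp_all - tp) / tp_all if tp_all else 0.0
    return float(dice), float(filtered_tp)

# Toy example: six voxels, with the two errors given high uncertainty
# and one correct voxel (the last) also marked as uncertain.
pred = np.array([1, 1, 1, 0, 0, 1], dtype=bool)
truth = np.array([1, 1, 0, 0, 1, 1], dtype=bool)
unc = np.array([0.1, 0.2, 0.9, 0.1, 0.8, 0.7])

for tau in (1.0, 0.5):
    print(tau, filtered_scores(pred, truth, unc, tau))
```

In the toy data, lowering the threshold raises the filtered Dice because both errors are removed, but it also discards one correct voxel, which is the trade-off the metric is described as scoring.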

Multitask Prompted Training Enables Zero-Shot Task Generalization

Victor Sanh , Albert Webson , Colin Raffel , Stephen H. Bach , Lintang Sutawika , Zaid Alyafeai , Antoine Chaffin , Arnaud Stiegler , Teven Le Scao , Arun Raja

Categories: Machine Learning | Natural Language Processing

2021-10-15

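Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language model pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language task into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks. We fine-tune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16 times its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6 times its size. All prompts and trained models are available at https://github.com/bigscience-workshop/promptsource/ and https://huggingface.co/bigscience/t0pp.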
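
Mapping a supervised example into several human-readable prompted forms can be sketched as below. The templates, the `apply_templates` helper, and the example record are hypothetical and are not taken from the released prompt collection; the sketch only shows the idea of producing multiple (input text, target text) pairs with different wording from one labeled example.

```python
# Hypothetical templates for a natural language inference example.
# Each entry pairs an input pattern with a mapping from label id to answer text.
TEMPLATES = [
    ('{premise}\nQuestion: does this imply that "{hypothesis}"? Yes or no?',
     {0: "Yes", 1: "No"}),
    ('Suppose "{premise}" Can we infer that "{hypothesis}"?',
     {0: "Yes", 1: "No"}),
]

def apply_templates(example):
    """Turn one labeled example into (input text, target text) pairs, one per template."""
    pairs = []
    for pattern, label_words in TEMPLATES:
        text = pattern.format(premise=example["premise"],
                              hypothesis=example["hypothesis"])
        pairs.append((text, label_words[example["label"]]))
    return pairs

example = {
    "premise": "A dog is running in the park.",
    "hypothesis": "An animal is outside.",
    "label": 0,  # 0 = entailment in this toy label scheme
}
for text, target in apply_templates(example):
    print(repr(text), "->", target)
```

Because both input and target are plain text, pairs like these can be fed to a text-to-text encoder-decoder model without task-specific output heads.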

Conservation Tools: The Next Generation of Engineering--Biology Collaborations

Andrew Schulz , Cassie Shriver , Suzanne Stathatos , Benjamin Seleb , Emily Weigel , Young-Hui Chang , M. Saad Bhamla , David Hu , Joseph R. Mendelson III

Categories: Machine Learning

2023-01-03

The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.

Fine-Grained Hard Negative Mining: Generalizing Mitosis Detection with a Fifth of the MIDOG 2022 Dataset

Maxime W. Lafarge , Viktor H. Koelzer

Categories: Computer Vision

2023-01-03

Making histopathology image classifiers robust to a wide range of real-world variability is a challenging task. Here, we describe a candidate deep learning solution for the Mitosis Domain Generalization Challenge 2022 (MIDOG) to address the problem of generalization for mitosis detection in images of hematoxylin-eosin-stained histology slides under high variability (scanner, tissue type and species variability). Our approach consists of training a rotation-invariant deep learning model using aggressive data augmentation with a training set enriched with hard negative examples and automatically selected negative examples from the unlabeled part of the challenge dataset. To optimize the performance of our models, we investigated a hard negative mining regime search procedure that led us to train our best model using a subset of image patches representing 19.6% of our training partition of the challenge dataset. Our candidate model ensemble achieved an F1-score of 0.697 on the final test set after automated evaluation on the challenge platform, achieving the third-best overall score in the MIDOG 2022 Challenge.
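
The general idea of hard negative mining can be sketched as follows. This is a generic illustration, not the regime search procedure described in the abstract: the function `mine_hard_negatives`, the parameter `k`, and the toy scores are assumptions. Hard negatives are taken here to be the negative examples to which the current model assigns the highest positive-class score.

```python
import numpy as np

def mine_hard_negatives(scores, labels, k):
    """Return indices of the k negative examples with the highest model scores."""
    neg_idx = np.flatnonzero(labels == 0)
    # Sort negatives by descending score; a stable sort keeps ties in input order.
    order = np.argsort(-scores[neg_idx], kind="stable")
    return neg_idx[order[:k]]

# Toy model scores for five image patches (label 1 = mitosis, 0 = non-mitosis).
scores = np.array([0.9, 0.8, 0.1, 0.7, 0.3])
labels = np.array([1, 0, 0, 0, 1])
hard = mine_hard_negatives(scores, labels, k=2)
print(hard.tolist())
```

The selected patches would then be added to (or kept in) the training set, so that training concentrates on the negatives the model most often confuses with positives.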

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding

Steven H. Wang , Antoine Scardigli , Leonard Tang , Wei Chen , Dimitry Levkin , Anya Chen , Spencer Ball , Thomas Woodside , Oliver Zhang , Dan Hendrycks

Categories: Natural Language Processing

2023-01-02

Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
